
 obfuscated gradient


Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness

Ma, Xingjun, Jiang, Linxi, Huang, Hanxun, Weng, Zejia, Bailey, James, Jiang, Yu-Gang

arXiv.org Artificial Intelligence

Evaluating the robustness of a defense model is a challenging task in adversarial robustness research. Obfuscated gradients have previously been found to exist in many defense methods and cause a false signal of robustness. In this paper, we identify a more subtle situation called Imbalanced Gradients that can also cause overestimated adversarial robustness. The phenomenon of imbalanced gradients occurs when the gradient of one term of the margin loss dominates and pushes the attack in a suboptimal direction. To exploit imbalanced gradients, we formulate a Margin Decomposition (MD) attack that decomposes a margin loss into individual terms and then explores the attackability of these terms separately via a two-stage process. We also propose a multi-targeted and ensemble version of our MD attack. By investigating 24 defense models proposed since 2018, we find that 11 models are susceptible to a certain degree of imbalanced gradients and that our MD attack can decrease their robustness, as evaluated by the best standalone baseline attack, by more than 1%. We also provide an in-depth investigation of the likely causes of imbalanced gradients and of effective countermeasures.
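As a rough illustration of what "one term of the margin loss dominates" can mean, the sketch below splits a margin loss of the form max_{i != y} z_i - z_y into its two terms for a toy linear classifier and compares the L1 norms of their input gradients. The linear model, the function names, and the norm-ratio measure are all assumptions made for this example; they are not necessarily the formulation or the metric used in the paper.

```python
import numpy as np

def margin_terms(logits, y):
    """Split the margin loss max_{i != y} z_i - z_y into its two terms.

    Returns the index j of the largest non-true logit, plus the two
    terms z_j and -z_y whose sum is the margin loss.
    """
    masked = logits.copy()
    masked[y] = -np.inf          # exclude the true class from the max
    j = int(np.argmax(masked))
    return j, logits[j], -logits[y]

def gradient_imbalance(W, x, y):
    """Ratio of the larger to the smaller per-term gradient L1 norm.

    Assumes a toy linear model with logits z = W @ x, so the input
    gradient of z_j is W[j] and the input gradient of -z_y is -W[y].
    The ratio is a hypothetical measure chosen for illustration.
    """
    j, _, _ = margin_terms(W @ x, y)
    n_other = np.linalg.norm(W[j], 1)
    n_true = np.linalg.norm(-W[y], 1)
    return max(n_other, n_true) / min(n_other, n_true)

# Toy weights: class 1 has a much steeper logit than the true class 0.
W = np.array([[1.0, 0.0], [0.0, 10.0], [0.5, 0.5]])
x = np.array([1.0, 0.06])
y = 0

j, term_other, term_true = margin_terms(W @ x, y)
ratio = gradient_imbalance(W, x, y)
```

In this toy setting a ratio far from 1 means that a single gradient step on the full margin loss is driven almost entirely by one term, which is the situation that attacking the terms separately is meant to address.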


ICML 2018 Announces Best Paper Awards – SyncedReview – Medium

#artificialintelligence

The International Conference on Machine Learning (ICML) 2018 will be held July 10–15 in Stockholm, Sweden. Yesterday, from more than 600 accepted papers, the prestigious conference announced its Best Paper Awards. Two papers shared top honours: Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, by Anish Athalye of MIT and Nicholas Carlini and David Wagner of UC Berkeley; and Delayed Impact of Fair Machine Learning, from a UC Berkeley research group led by Lydia T. Liu and Sarah Dean. The Best Paper Runner Up Awards go to Near Optimal Frequent Directions for Sketching Dense and Sparse Matrices, from Professor Zengfeng Huang of Fudan University; The Mechanics of n-Player Differentiable Games, from David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls and Thore Graepel of DeepMind and the University of Oxford; and Fairness Without Demographics in Repeated Loss Minimization, from a Stanford research group including Tatsunori B. Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang.


obfuscated-gradients

#artificialintelligence

Above is an adversarial example: the slightly perturbed image of the cat fools an InceptionV3 classifier into classifying it as "guacamole". Such "fooling images" are easy to synthesize using gradient descent (Szegedy et al. 2013). In our recent paper, we evaluate the robustness of eight papers accepted to ICLR 2018 as non-certified white-box-secure defenses to adversarial examples. We find that seven of the eight defenses provide a limited increase in robustness and can be broken by improved attack techniques we develop. See Section 5 of our paper for full numbers.
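To make the "synthesize using gradient descent" remark concrete, here is a minimal sketch of a targeted attack on a toy two-class linear softmax classifier. It is not the repository's code and does not use InceptionV3; the weights, the function name `targeted_attack`, and the parameters `eps`, `step`, and `iters` are hypothetical choices for this example. Each iteration takes a signed gradient step that lowers the cross-entropy toward the target class, then clips the result back into an L-infinity ball around the original input.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # subtract the max for numerical stability
    return e / e.sum()

def targeted_attack(W, x, target, eps=0.5, step=0.1, iters=20):
    """Signed gradient descent on the cross-entropy toward `target`.

    Assumes a toy linear model with logits W @ x; the perturbation is
    kept within an L-infinity ball of radius eps around x.
    """
    x_adv = x.copy()
    onehot = np.eye(W.shape[0])[target]
    for _ in range(iters):
        p = softmax(W @ x_adv)
        grad = W.T @ (p - onehot)              # d(cross-entropy)/d(input)
        x_adv = x_adv - step * np.sign(grad)   # descend on the target loss
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Toy classifier and an input it assigns to class 0.
W = np.array([[2.0, -1.0], [-1.0, 2.0]])
x = np.array([1.0, 0.5])

x_adv = targeted_attack(W, x, target=1)
clean_label = int(np.argmax(W @ x))
adv_label = int(np.argmax(W @ x_adv))
max_change = float(np.max(np.abs(x_adv - x)))
```

Defenses that obfuscate gradients interfere with the `grad` computation in a loop like this one, which is why such attacks can appear to fail even when adversarial examples still exist.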


Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

Athalye, Anish, Carlini, Nicholas, Wagner, David

arXiv.org Artificial Intelligence

We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. For each of the three types of obfuscated gradients we discover, we describe characteristic behaviors of defenses exhibiting this effect and develop attack techniques to overcome it. In a case study, examining non-certified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 8 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely and 1 partially.


anishathalye/obfuscated-gradients

@machinelearnbot

Above is an adversarial example: the slightly perturbed image of the cat fools an InceptionV3 classifier into classifying it as "guacamole". Such "fooling images" are easy to synthesize using gradient descent (Szegedy et al. 2013). In our recent paper, we evaluate the robustness of eight papers accepted to ICLR 2018 as defenses to adversarial examples. We find that seven of the eight defenses provide a limited increase in robustness and can be broken by improved attack techniques we develop. The only defense we observe that significantly increases robustness to adversarial examples within the threat model proposed is "Towards Deep Learning Models Resistant to Adversarial Attacks" (Madry et al. 2018), and we were unable to defeat this defense without stepping outside the threat model.